1,875 research outputs found

    Authorship attribution using co-occurrence networks

    This thesis approaches Authorship Attribution as a classification task. The methodologies represent text documents as graphs, from which several measures are extracted and used as samples for the classifier. Some previous works also follow this approach. This thesis focuses on a methodology that splits each text into multiple parts and treats each part as a separate graph, from which measures are extracted. The measures of the successive graphs are treated as a time series, from which moments are extracted. These moments form the final vector representing the entire text. The methodology is extended with two variations. The first skips the time-series step, so the measures of each graph are used directly as samples. The second models the entire text as a single graph. The methodologies are tested on English and Portuguese corpora with varying numbers of texts.
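The split-graph-moments pipeline described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the tokenizer, the window size (`window=50`), the three graph measures (node count, average degree, average clustering coefficient), linking only adjacent words, and using the mean and standard deviation as the extracted moments are all hypothetical choices.

```python
import re
import statistics
from collections import defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase word tokens only (no lemmatization or stopword removal).
    return re.findall(r"\w+", text.lower())


def cooccurrence_graph(tokens):
    # Undirected co-occurrence graph as adjacency sets: adjacent words are linked.
    adj = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def graph_measures(adj):
    # Illustrative measures (assumption): node count, average degree, average clustering.
    n = len(adj)
    if n == 0:
        return [0.0, 0.0, 0.0]
    avg_degree = sum(len(nbrs) for nbrs in adj.values()) / n
    total_clustering = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # clustering is taken as 0 for nodes with fewer than two neighbors
        nbr_list = list(nbrs)
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbr_list[j] in adj[nbr_list[i]]
        )
        total_clustering += 2 * links / (k * (k - 1))
    return [float(n), avg_degree, total_clustering / n]


def text_vector(text, window=50):
    # Split the text into consecutive chunks of `window` tokens and build one graph per chunk.
    tokens = tokenize(text)
    chunks = [tokens[i:i + window] for i in range(0, len(tokens), window)]
    series = [graph_measures(cooccurrence_graph(chunk)) for chunk in chunks]
    vector = []
    # Each measure across successive chunks forms a time series; keep its mean and std.
    for measure_series in zip(*series):
        vector.append(statistics.fmean(measure_series))
        vector.append(statistics.pstdev(measure_series))
    return vector
```

For example, `graph_measures(cooccurrence_graph(tokenize("a b c a")))` describes a triangle and returns `[3.0, 2.0, 1.0]`. Returning `series` directly instead of its moments roughly corresponds to the first variation, and calling `graph_measures` on the graph of the whole text corresponds to the second.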

    mCSM-AB: a web server for predicting antibody-antigen affinity changes upon mutation with graph-based signatures.

    Computational methods have traditionally struggled to predict the effect of mutations in antibody-antigen complexes on binding affinity. This has limited their usefulness during antibody engineering and development, as well as their ability to predict biologically relevant escape mutations. Here we present mCSM-AB, a user-friendly web server for accurately predicting antibody-antigen affinity changes upon mutation, which relies on graph-based signatures. We show that mCSM-AB performs better than comparable methods that have previously been used for antibody engineering. The mCSM-AB web server is available at http://structure.bioc.cam.ac.uk/mcsm_ab.
    This is the final published version. It first appeared at http://nar.oxfordjournals.org/content/early/2016/05/23/nar.gkw458.full

    A general theory to estimate Information transfer in nonlinear systems

    A general theory for computing information transfer in nonlinear systems driven by deterministic forcings and by additive and/or multiplicative noise is presented. It extends the Liang-Kleeman framework of causality inference based on information transfer across system variables (Liang, 2016). An effective method for computing the rates of entropy transfer (RETs), the Causal Sensitivity Method (CSM), is presented; it relies on estimating conditional expectations from data. These expectations are approximated by nonlinear regressions, which makes computing RETs much easier and more robust than the brute-force approach, which requires numerical integration over the phase space and knowledge of the system's multivariate probability density function. The CSM is furthermore fully adapted to the case where no model equations are available: a nonlinear model is first fitted to the data, and the CSM is then applied to the fitted model. Moreover, the RETs are decomposed into sums of one-to-one RETs plus synergetic terms, which account for the joint causal effect of groups of variables. State-dependent RET formulas are also proposed, which determine the dependencies and synergies between variables locally in phase space. The RET estimates are compared across the brute-force probability-density-based approach (AN), the CSM-based approach with and without model fitting, and the multivariate linear approach, for two models: (i) a model derived from a potential and (ii) the classical chaotic Lorenz system, both forced by additive and/or multiplicative noise. The analysis shows that the CSM estimates are robust and close to the AN reference values in the different experiments, providing evidence of the method's potential and opening new perspectives for real-world applications.
    Comment: 41 pages, 6 figures. Submitted to Physica
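The abstract compares the CSM against a linear approach. As a point of reference only, the sketch below implements the two-variable linear estimator of the Liang information flow rate from x2 to x1, T(2→1) = (C11·C12·C2,d1 − C12²·C1,d1) / (C11²·C22 − C11·C12²), where Cij are sample covariances and Ci,d1 is the covariance of xi with the time derivative of x1. It is not the CSM; the Euler forward difference for the derivative, the unit time step and the toy AR(1) system are assumptions, and the function names are hypothetical.

```python
import numpy as np


def liang_flow_2_to_1(x1, x2, dt=1.0):
    """Linear two-variable estimate of the information flow rate from x2 to x1."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    # Euler forward difference as the time derivative of x1 (assumption: one-step lag).
    dx1 = (x1[1:] - x1[:-1]) / dt
    x1, x2 = x1[:-1], x2[:-1]
    c = np.cov(np.vstack([x1, x2, dx1]))
    c11, c12, c22 = c[0, 0], c[0, 1], c[1, 1]
    c1d1, c2d1 = c[0, 2], c[1, 2]
    numerator = c11 * c12 * c2d1 - c12**2 * c1d1
    denominator = c11**2 * c22 - c11 * c12**2
    return numerator / denominator


def simulate(n=20000, seed=0):
    # Hypothetical toy system: x2 is an autonomous AR(1) process that drives x1.
    rng = np.random.default_rng(seed)
    x1 = np.zeros(n)
    x2 = np.zeros(n)
    for t in range(n - 1):
        x2[t + 1] = 0.7 * x2[t] + rng.standard_normal()
        x1[t + 1] = 0.5 * x1[t] + 0.5 * x2[t] + rng.standard_normal()
    return x1, x2
```

For this toy system the expected flow from `x2` to `x1` reduces to 0.5·C12/C11 (about 0.2 here), while the reverse flow, obtained with `liang_flow_2_to_1(x2, x1)`, should be close to zero because `x1` never feeds back into `x2`.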

    CYCLOSA: Decentralizing Private Web Search Through SGX-Based Browser Extensions

    By regularly querying Web search engines, users (unconsciously) disclose large amounts of personal data in their search queries, some of which may reveal sensitive information (e.g., health issues or sexual, political, or religious preferences). Several solutions exist that let users query search engines with improved privacy protection. However, these solutions suffer from a number of limitations: some are subject to user re-identification attacks, while others lack scalability or are unable to provide accurate results. This paper presents CYCLOSA, a secure, scalable and accurate private Web search solution. CYCLOSA improves security by relying on trusted execution environments (TEEs) as provided by Intel SGX. Further, CYCLOSA proposes a novel adaptive privacy protection solution that reduces the risk of user re-identification: it sends fake queries to the search engine and dynamically adapts their number according to the sensitivity of the user query. In addition, CYCLOSA is scalable because it is fully decentralized, spreading the load of distributing fake queries among other nodes. Finally, CYCLOSA preserves the accuracy of Web search because it handles the real query and the fake queries separately, in contrast to other existing solutions that mix fake and real query results.
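The adaptive fake-query idea can be illustrated with a small, hypothetical Python sketch. It assumes a sensitivity score in [0, 1] (how CYCLOSA actually assesses sensitivity is not described here), a pool of candidate fake queries and a `max_fakes` cap; all names are illustrative, and the SGX enclave and the relaying of queries through other nodes are not modeled. Because each query is answered separately, only the real query's results are returned to the user.

```python
import math
import random


def plan_queries(real_query, sensitivity, fake_pool, max_fakes=7, rng=None):
    """Return (query, is_real) pairs: the real query mixed with k fake queries."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be in [0, 1]")
    rng = rng or random.Random()
    # Hypothetical policy: the number of fakes grows linearly with sensitivity.
    k = min(math.ceil(sensitivity * max_fakes), len(fake_pool))
    batch = [(query, False) for query in rng.sample(fake_pool, k)]
    batch.append((real_query, True))
    rng.shuffle(batch)  # the batch order does not reveal which query is real
    return batch


def keep_real_results(batch, results_by_query):
    # Queries are answered separately, so fake results are simply discarded.
    return [r for query, is_real in batch if is_real for r in results_by_query[query]]
```

With `sensitivity=0.0` no fake query is sent; with `sensitivity=1.0` and `max_fakes=3`, three fakes accompany the real query. A linear policy is only one possible choice; any monotone mapping from sensitivity to fake count fits the description in the abstract.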

    Adapting Pretrained Language Models for Solving Tabular Prediction Problems in the Electronic Health Record

    We propose an approach for adapting the DeBERTa model to electronic health record (EHR) tasks using domain adaptation. We pretrain a small DeBERTa model on a dataset consisting of MIMIC-III discharge summaries, clinical notes, radiology reports, and PubMed abstracts. We compare this model's performance with that of a DeBERTa model pretrained on clinical texts from our institutional EHR (MeDeBERTa) and with an XGBoost model. We evaluate performance on three benchmark tasks for emergency department outcomes using the MIMIC-IV-ED dataset. We preprocess the data to convert it into text format and generate four versions of the original datasets to compare data processing and data inclusion. The results show that our proposed approach outperforms the alternative models on two of the three tasks (p<0.001) and matches their performance on the third, and that using descriptive columns improves performance over the original column names.
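The conversion of tabular records into text can be sketched as serializing each row into "column: value" pairs. The format, the omission of missing values, and the column names and descriptions below are hypothetical; the paper's four dataset versions are not reproduced. Passing a mapping of descriptive names illustrates the kind of substitution the abstract compares against the original column names.

```python
def serialize_row(row, descriptions=None):
    """Serialize one tabular EHR record into text for a language model."""
    descriptions = descriptions or {}
    parts = []
    for column, value in row.items():
        if value is None:
            continue  # assumption: missing values are omitted rather than written out
        # Use a descriptive name when one is provided, else the raw column name.
        name = descriptions.get(column, column)
        parts.append(f"{name}: {value}")
    return "; ".join(parts)


# Hypothetical triage record and column descriptions (illustrative only).
row = {"sbp": 128, "pain": 7, "acuity": None}
descriptions = {"sbp": "systolic blood pressure", "pain": "pain score"}
```

Here `serialize_row(row)` yields `"sbp: 128; pain: 7"`, while `serialize_row(row, descriptions)` yields `"systolic blood pressure: 128; pain score: 7"`; the resulting strings would then be tokenized and fed to the model.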

    Biological interactions between nematophagous fungi, Esteya spp., and the pinewood nematode, Bursaphelenchus xylophilus

    The pinewood nematode (PWN), Bursaphelenchus xylophilus, is a quarantine organism in several countries and the causal agent of pine wilt disease (PWD), a serious threat to pine forests worldwide. PWD results from complex interactions between the nematode, its insect vector (Monochamus spp.) and its host plants (conifers), with the nematode being the common element in these interactions. The PWN is considered the sixth most economically important plant-parasitic nematode. In Europe, this pest was first reported in Portugal in 1999, in maritime pine, Pinus pinaster. Because of its economic importance and worldwide distribution, an enormous research effort is devoted to B. xylophilus and PWD. Scenarios strongly suggest that climate change is likely to spread PWD and cause outbreaks in areas currently free of the disease. The urgent need for sustainable management strategies has led to increasing interest in antagonists capable of suppressing the PWN. Nematophagous fungi of the genus Esteya are reported as natural enemies of the PWN and promising biocontrol agents. Two species are currently described, E. vermicola and E. floridanum; the first can mimic volatile organic compounds naturally produced by Pinus spp. in order to attract the PWN. However, few studies have examined the development of Esteya spp. inside pine trees, and none has used maritime pine, the main and most affected species in Portuguese forests and their largest carbon reservoir. It is therefore crucial to understand the plant-nematode-fungus interactions between P. pinaster, B. xylophilus and Esteya spp. To this end, the biological interactions between the two Esteya species, the PWN and P. pinaster were investigated (fungus-fungus, fungus-nematode and fungus-tree), and feeding trials and chemotaxis assays were performed to determine the attractive power of both fungal species. These results should shed light on the most promising species for biocontrol and help devise new ways to manage PWD.